The use of gene ontology evidence codes in preventing classifier assessment bias

Authors

  • Mark F. Rogers
  • Asa Ben-Hur
Abstract

MOTIVATION: The biological community's reliance on computational annotations of protein function makes correct assessment of function prediction methods an issue of great importance. A large fraction of the annotations in current biological databases are based on computational methods, which can bias estimates of the accuracy of function prediction methods. The bias arises because predicting an annotation that was itself derived computationally is likely easier than predicting one that was derived experimentally, which leads to over-optimistic estimates of classifier performance.

RESULTS: We illustrate this phenomenon in a set of controlled experiments using a nearest neighbor classifier that uses PSI-BLAST similarity scores. Our results demonstrate that the source of Gene Ontology (GO) annotations used to assess a protein function predictor can have a highly significant influence on classifier accuracy: the average accuracy over four species and over GO terms in the biological process namespace increased from 0.72 to 0.87 when the classifier was given access to annotations assigned evidence codes that indicate a possible computational source, instead of experimentally determined annotations. Slightly smaller increases were observed in the other namespaces. In these comparisons the total number of annotations and their distribution across GO terms were kept the same.

CONCLUSION: Taking GO evidence codes into account is required for reporting accuracy statistics that do not overestimate a model's performance, and is of particular importance for a fair comparison of classifiers that rely on different information sources.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
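The evaluation idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the set of experimental evidence codes follows the standard GO evidence code categories (the abstract does not list the codes the paper used), the function names `split_by_evidence` and `nearest_neighbor_predict` are hypothetical, and the proteins and similarity scores are invented toy data standing in for real PSI-BLAST output.

```python
# Assumption: these are the GO evidence codes in the "experimental" category.
# The abstract does not state the exact grouping used in the paper.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def split_by_evidence(annotations):
    """Split (protein, go_term, evidence_code) triples into two lists of
    (protein, go_term) pairs: experimental and possibly computational."""
    experimental, possibly_computational = [], []
    for protein, go_term, code in annotations:
        if code in EXPERIMENTAL_CODES:
            experimental.append((protein, go_term))
        else:
            possibly_computational.append((protein, go_term))
    return experimental, possibly_computational


def nearest_neighbor_predict(query, similarity, labels):
    """Return the GO terms of the labeled protein most similar to `query`.

    `similarity` maps (query, protein) pairs to a score (higher is closer);
    `labels` maps each labeled protein to its set of GO terms.
    """
    candidates = [p for p in labels if p != query and (query, p) in similarity]
    if not candidates:
        return set()
    best = max(candidates, key=lambda p: similarity[(query, p)])
    return set(labels[best])


# Toy data, invented for illustration only.
toy_annotations = [
    ("P1", "GO:0006412", "IDA"),
    ("P2", "GO:0006412", "IEA"),
    ("P3", "GO:0008152", "ISS"),
    ("P4", "GO:0008152", "IMP"),
]
experimental, possibly_computational = split_by_evidence(toy_annotations)

# Hypothetical similarity scores between a query protein Q and labeled proteins.
toy_similarity = {("Q", "P1"): 12.0, ("Q", "P4"): 55.0}

# Build labels from experimental annotations only, then predict for Q.
labels = {}
for protein, go_term in experimental:
    labels.setdefault(protein, set()).add(go_term)
prediction = nearest_neighbor_predict("Q", toy_similarity, labels)
```

Running the same classifier once with labels built from `experimental` and once with labels built from `possibly_computational`, while holding the number of annotations per GO term fixed, is the kind of controlled comparison the abstract describes.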

Similar resources

A Sensor-Based Scheme for Activity Recognition in Smart Homes using Dempster-Shafer Theory of Evidence

This paper proposes a scheme for activity recognition in sensor based smart homes using Dempster-Shafer theory of evidence. In this work, opinion owners and their belief masses are constructed from sensors and employed in a single-layered inference architecture. The belief masses are calculated using beta probability distribution function. The frames of opinion owners are derived automatically ...

Designing an Ontology for Knowledge Discovery in Iran’s Vaccine

Ontology is a requirement engineering product and the key to knowledge discovery. It includes the terminology to describe a set of facts, assumptions, and relations with which the detailed meanings of vocabularies among communities can be determined. This is a qualitative content analysis research. This study has made use of ontology for the first time to discover the knowledge of vaccine in Ir...

CAR-NK Cells: A Systematic Review of Emerging Alternative on Immunotherapy Against Leukemia

Background: Cancer is a public health emergency. It has a high mortality rate despite numerous studies on pharmaceutical therapies. Chimeric antigen receptor-natural killer (CAR-NK) cells are a promising immunotherapy that could be used to treat cancer, especially leukemia. However, the evidence is still unclear. Thus, this systematic review aims to summarize the evidence regarding the use of CAR...

Framing Bias in the Interpretation of Quality Improvement Data: Evidence From an Experiment

Background A growing body of public management literature sheds light on potential shortcomings to quality improvement (QI) and performance management efforts. These challenges stem from heuristics individuals use when interpreting data. Evidence from studies of citizens suggests that individuals’ evaluation of data is influenced by the linguistic framing or context of that information an...

Assessment of Weighting Functions Used in Oppermann Codes in Polyphase Pulse Compression Radars

Polyphase codes are a common class of pulse compression waveforms in radar systems. The Oppermann code is one such code with a polyphase pattern. After compression, this code has little tolerance to Doppler shift in addition to its high sidelobe level. This indicates that the Oppermann code is an unsuitable scheme for radar applications. This paper shows that the use of amplitude wei...

Journal:
  • Bioinformatics

Volume 25, Issue 9

Pages: -

Publication date: 2009